The goal of this problem set is to develop some intuition about the impact of the number of nodes in the hidden layer of a neural network. We will use a few simulated examples, so that we have a clear understanding of the structure of the data being modeled, and will assess how the performance of the neural network model is affected by the structure in the data and the setup of the network.
First of all, to compensate for the lack of coverage of this topic in ISLR, let's go over a couple of simple examples. We start by simulating a simple two-class dataset in a 2D predictor space, with an outcome driven by an interaction between the attributes. (Please note that for the problems you will be working on this week you will be asked below to simulate a dataset using a different model.)
# fix seed so that narrative always matches the plots:
set.seed(1234567890)
nObs <- 1000
ctrPos <- 2
# 2*nObs points in 2D, scattered around the four corners (+/-ctrPos, +/-ctrPos)
xyTmp <- matrix(rnorm(4*nObs), ncol = 2)
xyCtrsTmp <- matrix(sample(c(-1, 1)*ctrPos, nObs*4, replace = TRUE), ncol = 2)
xyTmp <- xyTmp + xyCtrsTmp
# class is determined by the interaction (product of signs) of the two centers
gTmp <- paste0("class", (1 + sign(apply(xyCtrsTmp, 1, prod)))/2)
plot(xyTmp, col = as.numeric(factor(gTmp)), pch = as.numeric(factor(gTmp)),
     xlab = "X1", ylab = "X2")
abline(h = 0)
abline(v = 0)
Symbol color and shape indicate class. This is a classic XOR-type problem that defeats any approach estimating a single linear decision boundary. We used similar simulated data for the random forest (week 10) problem set.
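The claim above can be checked directly. The following sketch (re-simulating the same dataset so that the chunk is self-contained) fits a single linear decision boundary with logistic regression; on this interaction-driven class structure its training error should stay close to 50%:

```r
# Sketch: a single linear decision boundary cannot capture an XOR-type
# interaction -- logistic regression's training error stays near 50%
set.seed(1234567890)
nObs <- 1000
ctrPos <- 2
xyTmp <- matrix(rnorm(4*nObs), ncol = 2)
xyCtrsTmp <- matrix(sample(c(-1, 1)*ctrPos, nObs*4, replace = TRUE), ncol = 2)
xyTmp <- xyTmp + xyCtrsTmp
gNum <- (1 + sign(apply(xyCtrsTmp, 1, prod))) / 2   # 0/1 class labels
glmRes <- glm(gNum ~ xyTmp, family = binomial)
trainErr <- mean((predict(glmRes, type = "response") > 0.5) != gNum)
trainErr
```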
Simulate data with n = 1000 observations and p = 3 covariates, all independent random variables drawn from the uniform distribution on the \([-1, 1]\) interval: \(X_i\sim \mathcal{U}(-1, 1), i = \{1, 2, 3\}\). The resulting observations should uniformly fill a 3D cube centered at \((0, 0, 0)\) with edge length \(2\). Create a two-category class variable assigning all observations within a smaller cube, also centered at \((0, 0, 0)\) and with edge length \(2^\frac{2}{3}\), to one class category, and observations outside of this cube to the second class category.
Please note that this dataset is entirely different from the one used in the preface: you will need to write the code simulating it on your own. A somewhat related 2D dataset was used as a motivational example in the week 12 (SVM) lecture before introducing kernels and SVMs. However, the example in the lecture was in 2D (the problem here is three-dimensional), had a spherical boundary (here we work with nested cubes) and used the normal distribution (here, uniform). Since you will be reusing this code in the following two problems it is probably best to turn this procedure into a function with appropriate parameters.
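As suggested, the simulation can be wrapped into a function; a minimal sketch (the function name and parameters are illustrative, not prescribed by the assignment):

```r
# Illustrative generator: n points uniform in the [-1,1]^3 cube, class 1
# assigned to points outside the inner cube of the given edge length
simCubeData <- function(n = 1000, edgeInner = 2^(2/3)) {
  x <- matrix(runif(3 * n, -1, 1), ncol = 3,
              dimnames = list(NULL, c("x", "y", "z")))
  # a point is outside the inner cube if any |coordinate| exceeds half the edge
  cl <- as.numeric(apply(abs(x), 1, max) > edgeInner / 2)
  data.frame(x, outcome = cl)
}
d <- simCubeData(10000)
mean(d$outcome)   # should be close to 0.5
```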
Ten points available for this problem are composed of accomplishing the following tasks:
- Correct implementation of the function generating data as prescribed above (2 points)
library(scatterplot3d)
set.seed(1)
nObs <- 1000
x <- runif(nObs, -1, 1)
y <- runif(nObs, -1, 1)
z <- runif(nObs, -1, 1)
# Half the edge length of the inner cube: 2^(2/3)/2 = 2^(-1/3)
myBound <- 2^(2/3) / 2
# Fourth column: 1 if the point lies outside the inner cube, 0 otherwise
twoClassesIn3D <- cbind(x, y, z,
                        as.numeric(abs(x) > myBound | abs(y) > myBound | abs(z) > myBound))
- Check and demonstrate numerically that the resulting class assignment splits these observations, subject to sampling variability, evenly between these two groups (2 points)
- Explain why this is achieved by using \(2^\frac{2}{3}\) as the length of the inner cube edge that defines true decision boundary (2 points)
print("Fraction of observations in class 1 (outside the inner cube):")
## [1] "Fraction of observations in class 1 (outside the inner cube):"
sum(twoClassesIn3D[,4]) / length(twoClassesIn3D[,4])
## [1] 0.513
print("Volume of inner cube = the cube of the shape's edge length:")
## [1] "Volume of inner cube = the cube of the shape's edge length:"
(2^(2/3))^3
## [1] 4
print("Volume of the remaining space is the total outer cube volume (edge=2) minus the inner cube volume (edge=2^(2/3)):")
## [1] "Volume of the remaining space is the total outer cube volume (edge=2) minus the inner cube volume (edge=2^(2/3)):"
(2^3) - (2^(2/3))^3
## [1] 4
As the output above shows, the classes split roughly 50/50, up to sampling variability. This is because the two regions have equal volume (4 cubic units each): the inner cube has volume \((2^\frac{2}{3})^3 = 2^2 = 4\), exactly half of the outer cube's volume \(2^3 = 8\). More directly, the inner edge length \(s\) that halves the volume solves \(s^3 = 2^3/2 = 4\), i.e. \(s = 4^\frac{1}{3} = 2^\frac{2}{3}\).
- Plot values of the resulting covariates projected at each pair of the axes, indicating classes to which observations belong with symbol color and/or shape (you can use function pairs, for example) (2 points)
scatterplot3d(twoClassesIn3D[, 1],
              twoClassesIn3D[, 2],
              twoClassesIn3D[, 3],
              xlab = "x",
              ylab = "y",
              zlab = "z",
              main = "Two classes uniformly distributed in three dimensions",
              sub = "Red points lie outside the inner cube (edge length 2^(2/3)) at the origin",
              color = ifelse(twoClassesIn3D[, 4], "red", "black"))
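Since the task suggests pairs, here is a sketch of the pairwise projections (re-simulating the data so the chunk is self-contained). Note that in any 2D projection the two classes overlap: a point outside the inner cube only along the third axis projects into the inner square, which is worth keeping in mind for the geometry question below.

```r
set.seed(1)
n <- 1000
xyz <- matrix(runif(3 * n, -1, 1), ncol = 3,
              dimnames = list(NULL, c("x", "y", "z")))
# TRUE for points outside the inner cube of edge length 2^(2/3)
isOuter <- apply(abs(xyz), 1, max) > 2^(-1/3)
pairs(xyz, col = ifelse(isOuter, "red", "black"),
      pch = ifelse(isOuter, 1, 3),
      main = "Pairwise projections; red = outside the inner cube")
```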
- Reflect on the geometry of this problem by answering the following question: what is the smallest number of planes in 3D space that would completely enclose points from the “inner” class?
Four planes: the four triangular faces of a tetrahedron are the smallest set of planes that completely encloses a bounded region in 3D.
Is this number equal to the number of cube faces or is it something smaller? Larger? Please note that “enclose” above does not mean “perfectly discriminate between the points assigned to two classes” (2 points)
Smaller than the face count of a cube: four planes (a tetrahedron) versus the cube's six faces.
For the dataset simulated above fit neural networks with 1 through 6 nodes in a single hidden layer (use the neuralnet implementation). For each of them calculate training error (see an example in the Preface where it was calculated using the err.fct field in the result returned by neuralnet).
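For the training error mentioned above, neuralnet itself records the final value of err.fct: it is stored in the "error" row of result.matrix, and the fitted training-set outputs are in net.result. A small self-contained sketch (data re-simulated; the hidden-layer size and thresholds here are illustrative):

```r
library(neuralnet)
set.seed(2)
x <- matrix(runif(3 * 100, -1, 1), ncol = 3,
            dimnames = list(NULL, c("x", "y", "z")))
outcome <- as.numeric(apply(abs(x), 1, max) > 2^(-1/3))
nnRes <- neuralnet(outcome ~ ., data.frame(x, outcome), hidden = 2,
                   err.fct = "ce", linear.output = FALSE,
                   threshold = 0.1, stepmax = 1e6)
nnRes$result.matrix["error", 1]                   # training cross-entropy (err.fct)
trainErr <- mean(round(nnRes$net.result[[1]]) != outcome)
trainErr                                          # training misclassification rate
```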
# Simulate the nested-cube data and split it into training and test halves
getTestData <- function(nObsArg = 1000) {
  nObs <- nObsArg
  x <- runif(nObs, -1, 1)
  y <- runif(nObs, -1, 1)
  z <- runif(nObs, -1, 1)
  # Half the edge length of the inner cube
  myBound <- 2^(2/3) / 2
  twoClassesIn3D <- cbind(x, y, z,
                          as.numeric(abs(x) > myBound | abs(y) > myBound | abs(z) > myBound))
  colnames(twoClassesIn3D) <- c("x", "y", "z", "outcome")
  # A random half of the indices goes to the training set, the rest to the test set
  randomIndices <- sample(1:nObs, floor(nObs/2))
  xyzTrain   <- twoClassesIn3D[ randomIndices, 1:3]
  xyzTest    <- twoClassesIn3D[-randomIndices, 1:3]
  classTrain <- twoClassesIn3D[ randomIndices, 4]
  classTest  <- twoClassesIn3D[-randomIndices, 4]
  list(xyzTest    = xyzTest,
       xyzTrain   = xyzTrain,
       classTest  = classTest,
       classTrain = classTrain)
}
gtd <- getTestData()
xyzTest = gtd$xyzTest
xyzTrain = gtd$xyzTrain
classTest = gtd$classTest
classTrain = gtd$classTrain
library(neuralnet)
computeErrorRate <- function(myTable) {
  1 - sum(diag(myTable)) / sum(myTable)
}
getNnTmpTable <- function(xyzTest, xyzTrain, classTest, classTrain, hiddenArg) {
  nnRes <- neuralnet(outcome ~ ., cbind(xyzTrain, outcome = classTrain),
                     hidden = c(hiddenArg), rep = 1, stepmax = 1e+06,
                     threshold = 0.1, linear.output = FALSE, err.fct = "ce")
  # Return NULL for fits that did not converge (predict() fails on them);
  # explicit factor levels keep the confusion matrix 2x2 in all cases
  tryCatch(table(factor(classTest, levels = 0:1),
                 factor(round(predict(nnRes, newdata = xyzTest)), levels = 0:1)),
           error = function(e) NULL)
}
myHeights <- c()
myHiddenLayerNodeCounts <- 1:6
for (hA in myHiddenLayerNodeCounts) {
  nnTable <- getNnTmpTable(xyzTest, xyzTrain, classTest, classTrain, hiddenArg = hA)
  print(paste("Number of nodes in hidden layer: ", hA))
  # Record NA for failed fits so that the barplot below keeps one bar per size
  newHeight <- if (is.null(nnTable)) NA else computeErrorRate(nnTable)
  print(paste("Error rate: ", newHeight))
  myHeights <- c(myHeights, newHeight)
}
## [1] "Number of nodes in hidden layer: 1"
## [1] "Error rate: 0.416"
## [1] "Number of nodes in hidden layer: 2"
## [1] "Error rate: 0.3"
## [1] "Number of nodes in hidden layer: 3"
## [1] "Error rate: 0.24"
## [1] "Number of nodes in hidden layer: 4"
## [1] "Error rate: 0.084"
## [1] "Number of nodes in hidden layer: 5"
## [1] "Error rate: 0.204"
## Warning: Algorithm did not converge in 1 of 1 repetition(s) within the
## stepmax.
# Plot the error rates for each hidden layer node count
barplot(height = myHeights,
        names.arg = myHiddenLayerNodeCounts,
        col = rainbow(length(myHeights)),
        xlab = "Number of nodes in hidden layer",
        ylab = "Error rate")
Simulate another independent dataset (with n = 10,000 observations, to make the resulting test error estimates less variable) using the same procedure as above (3D, two classes as nested cubes) and use it to calculate the test error at each number of hidden nodes. Plot training and test errors as a function of the number of nodes in the hidden layer. What does the resulting plot tell you about the interplay between model error, model complexity and problem geometry? What is the geometrical interpretation of this error behavior?
gtd <- getTestData(10000)
xyzTest = gtd$xyzTest
xyzTrain = gtd$xyzTrain
classTest = gtd$classTest
classTrain = gtd$classTrain
for (hA in 1:6) {
  nnTable <- getNnTmpTable(xyzTest, xyzTrain, classTest, classTrain, hiddenArg = hA)
  print(paste("Number of nodes in hidden layer: ", hA))
  print(paste("Error rate: ", if (is.null(nnTable)) NA else computeErrorRate(nnTable)))
}
## [1] "Number of nodes in hidden layer: 1"
## [1] "Error rate: 0.4034"
## [1] "Number of nodes in hidden layer: 2"
## [1] "Error rate: 0.3008"
## [1] "Number of nodes in hidden layer: 3"
## [1] "Error rate: 0.3032"
## [1] "Number of nodes in hidden layer: 4"
## [1] "Error rate: 0.1562"
## Warning: Algorithm did not converge in 1 of 1 repetition(s) within the
## stepmax.
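To address the plotting part of the question, a sketch that collects both training and test misclassification error per hidden-layer size and draws the two curves together (data re-simulated for self-containedness; handling of non-converged fits is omitted for brevity):

```r
library(neuralnet)
set.seed(4)
makeCube <- function(n) {
  x <- matrix(runif(3 * n, -1, 1), ncol = 3,
              dimnames = list(NULL, c("x", "y", "z")))
  data.frame(x, outcome = as.numeric(apply(abs(x), 1, max) > 2^(-1/3)))
}
train <- makeCube(500)
test <- makeCube(2000)
errs <- sapply(1:6, function(h) {
  nnRes <- neuralnet(outcome ~ ., train, hidden = h, err.fct = "ce",
                     linear.output = FALSE, threshold = 0.1, stepmax = 1e6)
  c(train = mean(round(nnRes$net.result[[1]]) != train$outcome),
    test  = mean(round(predict(nnRes, newdata = test[, 1:3])) != test$outcome))
})
matplot(1:6, t(errs), type = "b", pch = 1:2, lty = 1:2, col = 1:2,
        xlab = "Number of nodes in hidden layer", ylab = "Misclassification error")
legend("topright", c("train", "test"), pch = 1:2, lty = 1:2, col = 1:2)
```

Geometrically, each hidden node contributes one (soft) separating plane, so one would expect the error to drop sharply once enough planes are available to carve out the inner cube.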
Set up a simulation repeating the procedure described above for n = 100, 200 and 500 observations in the training set, as well as adding none, 1, 2 and 5 null variables to the training and test data (and to the covariates in the formula provided to neuralnet). Draw values for null variables from the uniform distribution on \([-1, 1]\) as well, and do not use them in the assignment of the observations to the class category (e.g. x <- matrix(runif(600, -1, 1), ncol = 6); cl <- as.numeric(factor(rowSums(abs(x[, 1:3]) < 2^(-1/3)) == 3)) creates a dataset with three informative and three null variables). Repeat the calculation of training and test errors at least several times for each combination of sample size, number of null variables and size of the hidden layer, simulating new training and test datasets every time, to assess variability in those estimates. Present the resulting error rates so that the effects of sample size and fraction of null variables can be discerned, and discuss their impact on the resulting model fits.
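The simulation described above can be organized around a single helper that draws fresh data and returns the errors for one combination of settings; a sketch (function name and defaults are illustrative):

```r
library(neuralnet)
# One simulation run: nTrain training points, nNull uninformative covariates,
# nHidden hidden nodes; returns training and test misclassification error
simOne <- function(nTrain, nNull, nHidden, nTest = 1000) {
  p <- 3 + nNull
  makeData <- function(n) {
    x <- matrix(runif(n * p, -1, 1), ncol = p,
                dimnames = list(NULL, paste0("x", 1:p)))
    # class depends only on the first three (informative) covariates
    data.frame(x, cl = as.numeric(apply(abs(x[, 1:3, drop = FALSE]), 1, max) > 2^(-1/3)))
  }
  train <- makeData(nTrain)
  test <- makeData(nTest)
  nnRes <- neuralnet(cl ~ ., train, hidden = nHidden, err.fct = "ce",
                     linear.output = FALSE, threshold = 0.1, stepmax = 1e6)
  c(train = mean(round(nnRes$net.result[[1]]) != train$cl),
    test  = mean(round(predict(nnRes, newdata = test[, 1:p])) != test$cl))
}
set.seed(5)
e <- simOne(100, 1, 2)
e
```

Each cell of the design grid (e.g. expand.grid(nTrain = c(100, 200, 500), nNull = c(0, 1, 2, 5), nHidden = 1:6)) would then be filled by several replicate calls, e.g. replicate(5, simOne(100, 1, 2)).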
Use neuralnet to model the outcome in the WiFi localization dataset that we used in previous weeks, using the outcome in its original, four-level format. Your neuralnet models will need to have four output nodes to represent an outcome with four levels. Obtain training and test error for this model for several sizes of the hidden layer. Compare the resulting test error in predicting Location 3 to that observed for the random forest, SVM and KNN approaches earlier in the course.
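neuralnet expresses a multi-level outcome as one indicator column per class level on the left-hand side of the formula. A hedged sketch of the mechanics on iris as a stand-in (the WiFi data and its column names are not reproduced here):

```r
library(neuralnet)
set.seed(3)
# One 0/1 indicator column per class level
oneHot <- model.matrix(~ 0 + Species, iris)
irNN <- cbind(iris[, 1:4], oneHot)
# Formula lists all indicator columns as joint outputs
fml <- as.formula(paste(paste(colnames(oneHot), collapse = " + "),
                        "~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width"))
nnRes <- neuralnet(fml, irNN, hidden = 3, linear.output = FALSE)
# Predicted class = output node with the largest activation
predCls <- max.col(predict(nnRes, newdata = iris[, 1:4]))
mean(predCls != as.numeric(iris$Species))   # training error rate
```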
For reproducibility purposes it is always a good idea to capture the state of the environment that was used to generate the results:
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scatterplot3d_0.3-41 ggplot2_3.2.1 neuralnet_1.44.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 knitr_1.24 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.0 rlang_0.4.1
## [9] stringr_1.4.0 dplyr_0.8.3 tools_3.5.1 gtable_0.3.0
## [13] xfun_0.9 withr_2.1.2 htmltools_0.4.0 assertthat_0.2.1
## [17] yaml_2.2.0 lazyeval_0.2.2 digest_0.6.22 tibble_2.1.3
## [21] crayon_1.3.4 purrr_0.3.2 glue_1.3.1 evaluate_0.14
## [25] rmarkdown_1.15 stringi_1.4.3 compiler_3.5.1 pillar_1.4.2
## [29] scales_1.0.0 pkgconfig_2.0.3
The time it took to knit this file from beginning to end is about (seconds):
proc.time() - ptStart
## user system elapsed
## 1697.848 732.501 2196.329